We have studied the effect of memory on evolution of the prisoner's dilemma game using square 
lattice networks. Based on extensive simulations, we found that the density of cooperators was 
enhanced by an increasing memory effect for most parameters. However, we also observed that the 
density of cooperators decreased with an increased memory effect in the case of a large memory 
and moderate temptation. It is interesting to note that memory makes cooperators immune from 
temptation. The strength of protection reaches its maximal value only for the moderate memory 
effect. 

PACS numbers: 02.50.Le, 05.50.+q, 64.60.Ht, 87.23.Ge



I. INTRODUCTION 

The evolutionary prisoner's dilemma game (PDG)
has attracted substantial attention over the past few
decades. In this game, two agents must simultaneously
select one of two strategies: cooperation or defection.
The agents receive payoffs that depend on both of
their choices. A selfish agent will adapt its strategy
to maximize its payoff. Game theory involves the
construction of many types of models and the analysis of
these models under varied parameters. Therefore, game theory
serves as a powerful metaphor for simulating the
interactions between individuals in many domains, including
biology, economics, and ecology.

In the PDG, mutual cooperation generates the highest
return for the community. However, the Nash equilibrium
state is mutual defection because defection is the better
choice for a prisoner, regardless of the strategy of
the other prisoner. Importantly, in the real world, mutual
cooperation is the most commonly utilized strategy.
Systems such as the PDG are considered to be an important
tool for studying the emergence of cooperative behavior
between selfish individuals. Nowak and May [5] introduced
a spatial prisoner's dilemma game (SPDG) consisting of a
two-state cellular automaton. In the general SPDG, the
agents play the PDG with their network neighbors and
receive payoffs according to a payoff matrix. The total
payoff of each agent is the sum of all payoffs in this step.
An agent may then mimic a neighbor's strategy by comparing
its payoff in this step with the neighbor's payoff. An
important conclusion is that spatial structure can promote
the persistence of cooperation, because the interactions of
an agent are limited to its local neighbors. Spatial PDG
models have been extensively explored in the past few years.
In addition to spatial structure, there are several mechanisms
that may facilitate the emergence and persistence of cooperation
among populations. Hamilton found that kin selection
can favor cooperation [9]. Axelrod's model demonstrated
that the tit-for-tat strategy could sustain cooperation in
systems of all players playing the game together. The
simulations performed by Szabo, Vukov, and Szolnoki provided
evidence that noise and irrational choices affect the
maintenance of cooperative behavior [10].

In the traditional SPDG model, the probability of changing
strategy is determined by the agents' performance in a
single step. In other words, the agents are assumed to be
shortsighted and forgetful. In fact, when people make an
important decision, they generally consider both the current
situation and their past experiences. Therefore, the effect
of memory should be taken into account. Historical memory
plays a key role in the evolutionary game [11]. The purpose
of this paper is to evaluate whether memory enhances the
density of cooperators and protects the cooperators from
temptation. We observed a maximum in the critical point
that separates the homogeneous cooperator state from the
mixed state of cooperators and defectors.

In this paper, we consider an evolutionary SPDG with 
the memory effect in a square lattice, in which players 
update their strategy by considering previous payoffs. 
The rules of the game are explained in section II. The 
simulations, which are detailed in section III, show that 
the evolution of SPDG depends on the magnitude of the 
memory effect and payoff-matrix elements. Conclusions 
are drawn in the last section. 



II. MODEL 

In the traditional PDG, there are two players. Each
player chooses one of two strategies: cooperator (C) or
defector (D). There are four combinations for the two
players: (C, C), (C, D), (D, C), and (D, D), which
correspond to payoffs (R, R), (S, T), (T, S), and (P,
P). The rewards or punishments for each player can be
tabulated as a 2 x 2 payoff matrix (see Table I).

TABLE I. The payoff matrix of the prisoner's dilemma game.
Each entry lists the payoff of player 1 followed by the
payoff of player 2.

player1\player2    C      D
C                  R\R    S\T
D                  T\S    P\P

The four elements of the payoff matrix satisfy the
ranking T > R > P > S and the additional constraint
T + S < 2R for repeated interactions. As suggested by
Nowak and May [5], the parameters in this paper are
R = 1, T = b, S = 0, and P = 0. Our model preserves the
essentials of the PDG, and b is the only tunable parameter.
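
As a minimal illustration of the payoff matrix with the parametrization R = 1, T = b, S = 0, P = 0, the pairwise payoffs can be sketched as follows. The function name `payoff` and the value b = 1.5 are assumptions chosen for illustration only; they do not come from the original work.

```python
from typing import Tuple


def payoff(s1: str, s2: str, b: float) -> Tuple[float, float]:
    """Return (payoff of player 1, payoff of player 2) for strategies 'C' or 'D'."""
    # Parametrization quoted in the text: R = 1, T = b, S = 0, P = 0.
    R, T, S, P = 1.0, b, 0.0, 0.0
    table = {
        ("C", "C"): (R, R),
        ("C", "D"): (S, T),
        ("D", "C"): (T, S),
        ("D", "D"): (P, P),
    }
    return table[(s1, s2)]


b = 1.5  # illustrative temptation value (assumption)
for pair in [("C", "C"), ("C", "D"), ("D", "C"), ("D", "D")]:
    print(pair, payoff(pair[0], pair[1], b))
```

With this parametrization the constraint T + S < 2R reduces to b < 2.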

Our study is based on systematic Monte Carlo (MC)
simulations on a square lattice network with periodic
boundary conditions. When we apply the PDG to the
network, the players are located on the nodes. In every
MC step, the players simultaneously play the PDG
with their network neighbors (nearest neighbors only)
and with themselves. The sum payoff of each player is
the sum over all games. The evolutionary process is
governed by strategy imitation. In every MC step, all agents
may mimic a neighbor's strategy. Player i adopts the
strategy of a randomly chosen neighbor (at site j) with a
probability that depends upon the payoff difference:

$$
W = \frac{1}{1 + \exp\left[(E_m(i) - E_m(j))/\kappa\right]}, \qquad (1)
$$

where κ indicates the noise generated by the players
allowing irrational choices [12, 13]. In this work, we use
κ = 0.1 for all simulations. E_m(i) and E_m(j) are the
total payoffs, which contain the sum payoff U at this MC
step and the cumulative historical payoff. For each
node i, there are two memories, M_C(i,t) and M_D(i,t), at
step t. When node i is associated with strategy C
and the sum payoff at this MC step is U,

$$
\begin{aligned}
E_m(i) &= U + M_C(i,t), \\
M_C(i,t+1) &= (M_C(i,t) + U)\,\tau, \\
M_D(i,t+1) &= M_D(i,t)\,\tau,
\end{aligned} \qquad (2)
$$

for this time step. When node i is associated with
strategy D,

$$
\begin{aligned}
E_m(i) &= U + M_D(i,t), \\
M_D(i,t+1) &= (U + M_D(i,t))\,\tau.
\end{aligned} \qquad (3)
$$

Here, τ is the memory factor and 0 ≤ τ < 1. M_C(i,t)
and M_D(i,t) represent the historical payoffs of C and D,
respectively. The memory of each MC step declines with
time. In other words, the memories of the payoffs,
M_C(i,t) and M_D(i,t), are forgotten as time passes.
τ = 0 indicates that there is no memory effect. As τ
approaches 1, the model has an almost perfect memory.
Starting from a random initial state with an equal
fraction of C and D and M_C(i,0) = M_D(i,0) = 0,
we iterate the model with a synchronized update.



III. SIMULATION RESULTS

FIG. 1: (Color online) Density of cooperators f_c as a function
of the payoff parameter b for various memory factors τ.



Our simulations are carried out by varying b and τ.
The results described in this paper are obtained from
MC simulations with a system size of 200 x 200, with the
exception of the results shown in Fig. 4. A larger network
decreases the ensemble error caused by the finite size
of the network. We have also simulated our model on
100 x 100 and 400 x 400 lattices and found no conspicuous
difference among these networks. The results in this
manuscript are averages over 20 trials with different
random seeds, which also reduces the error. Therefore,
a 200 x 200 lattice is large enough. The transient time is
varied from 20,000 to 80,000 MC steps. After the transient,
the system reaches a stable state in which the
amplitudes of the population fluctuations are considerably
smaller than the corresponding average values.
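
The update rule of Eqs. (1)-(3) and the setup described above can be sketched in a few lines of NumPy. This is a minimal sketch under stated assumptions, not the authors' code: the function names `mc_step` and `run`, the small lattice and short run used in the demonstration, the clipping of the exponent, and the choice to let M_C decay by τ while a node holds D (by symmetry with the last line of Eq. (2)) are all assumptions.

```python
import numpy as np


def mc_step(s, m_c, m_d, b, tau, kappa, rng):
    """One synchronous MC step; s holds +1 (C) or -1 (D) on a periodic L x L lattice."""
    c = (s == 1).astype(float)
    # Cooperators among the four nearest neighbours plus the node itself.
    n_c = c + sum(np.roll(c, shift, axis) for shift in (1, -1) for axis in (0, 1))
    # R = 1, T = b, S = P = 0: a C earns 1 per cooperating partner (itself included),
    # a D earns b per cooperating neighbour and nothing from playing itself.
    u = np.where(s == 1, n_c, b * n_c)
    # Total payoff: current payoff plus the memory of the strategy currently held.
    e = u + np.where(s == 1, m_c, m_d)
    # Memory update; the decay of the memory of the strategy NOT held is an assumption.
    m_c_new = tau * (m_c + np.where(s == 1, u, 0.0))
    m_d_new = tau * (m_d + np.where(s == -1, u, 0.0))
    # Every node picks one random nearest neighbour j.
    size = s.shape[0]
    choice = rng.integers(0, 4, size=s.shape)
    di = np.array([1, -1, 0, 0])[choice]
    dj = np.array([0, 0, 1, -1])[choice]
    ii, jj = np.indices(s.shape)
    ni, nj = (ii + di) % size, (jj + dj) % size
    # Adoption probability of Eq. (1); the exponent is clipped to avoid overflow.
    x = np.clip((e - e[ni, nj]) / kappa, -700.0, 700.0)
    w = 1.0 / (1.0 + np.exp(x))
    adopt = rng.random(s.shape) < w
    return np.where(adopt, s[ni, nj], s), m_c_new, m_d_new


def run(L=50, b=1.5, tau=0.5, kappa=0.1, steps=200, seed=0):
    """Return the density of cooperators after `steps` MC steps (no transient removed)."""
    rng = np.random.default_rng(seed)
    s = rng.choice(np.array([1, -1]), size=(L, L))
    m_c = np.zeros((L, L))
    m_d = np.zeros((L, L))
    for _ in range(steps):
        s, m_c, m_d = mc_step(s, m_c, m_d, b, tau, kappa, rng)
    return float(np.mean(s == 1))


print(run(L=20, steps=50))
```

A production run would use the 200 x 200 lattice, the transient of 20,000 to 80,000 MC steps, and the average over 20 seeds described in the text; the small values here only keep the example fast.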

To characterize the macroscopic behavior of the system,
we first measure the density of C, f_c. Fig. 1 shows
f_c on the square lattice as a function of b for several
values of τ. We find that there are two thresholds
of temptation b. When b < b_c1, the stable state of the
network contains only C. The density of C decreases
monotonically with increasing b for b > b_c1. We have
uploaded movies that show how systems with a network
size of 240 x 240 evolve over 300 MC steps after the
transient time for different b and τ. In these movies,
Cs are represented by black boxes and Ds by red boxes.
Agents that use the same strategy cluster together
to form complex patterns that continuously move and
change shape. These patterns develop because agents
change their strategies by learning from their neighbors.
Furthermore, the Cs that cluster together are more stable
because they support each other by earning payoffs from
their C neighbors. For b > b_c2, the C strategy dies out. Both
the memory factor τ and κ affect the critical points [13].
Recently, Szabo, Vukov, and Szolnoki drew
a κ-b plane for Newman-Watts networks. In contrast
to τ, κ does not conspicuously affect b_c1 or b_c2 in this
model. The main focus of this paper is to evaluate how
the memory factor τ affects the density of C and b_c1.
Determination of κ-b_c1 and κ-b_c2 is beyond the scope
of this paper. b_c2 increases monotonically with τ; however,
b_c1 reaches its maximum value near τ = 0.72 and
tends toward 4/3 as τ approaches 0 or 1 (see black
squares in Fig. 2). In Fig. 1, we find that the memory effect
enhances the density of C in most cases; the density of C
decreases with increasing τ only for τ > 0.72
and 1.75 < b < 1.8. It should be noted that our simulations
are consistent with those presented in Fig. 1 of
Ref. [12] for τ = 0, despite the fact that Szabo and Toke
used an asynchronous update rule in their model. The
mean-field results of the six-point approximation agree
with the simulation in Ref. [12] and with our model in the
case of τ = 0. We assume that the six-point approximation
includes the main features of the two models. Importantly,
the six-point approximation does not contain a restriction
on the update rule. Therefore, we conjecture that
the synchronized update does not play an important role
in the two models.

FIG. 2: (Color online) b_c1 as a function of the memory factor τ
on the square lattice. The data points depicted by squares
(black) are the result of MC simulations, and the data points
depicted by triangles (red) were derived from Eq. (6).

FIG. 3: (Color online) The average payoffs for strategies C
and D as a function of the payoff parameter b for several
values of the memory factor τ.

FIG. 4: The critical exponent β at b_c1 as a function of τ. The
error bars present the standard deviation. In
order to suppress the statistical error in the critical regions,
we use the system size 600 x 600 for τ < 0.6, 800 x 800 for
0.6 < τ < 0.75, and 1000 x 1000 for τ > 0.75.

In comparison with the case of τ = 0, the enhancement
of the density of C is caused by M_C and M_D. From the
above definitions, the M_C and M_D of a node are determined
by two factors: (1) the payoff income U of every MC step
and (2) whether the node maintains one strategy. M_C or
M_D accumulates if the node persists in C or D,
respectively. Fig. 3 plots the average M_C and M_D of all
nodes as a function of b. It should be noted that M_C is
always larger than M_D. Therefore, the memory effect almost
always enhances f_c in this model. For b < b_c1, the network
contains only C. Every node receives a payoff of 5 at every
MC step, and M_C is 5τ/(1 − τ). Then, as b increases, the
emergence of D reduces the payoff of C at every MC
step and interrupts the continuous accumulation of M_C.
As a result, M_C gradually decreases with b until C dies
out and M_C = 0. In contrast to M_C, M_D has a peak in the
C-D coexistence region. Outside the mixed region, M_D is
equal to 0, because D earns a payoff only by playing
the game with C. Therefore, M_D is nonzero only in the
C-D coexistence region b_c1 < b < b_c2. When b is slightly
larger than b_c1 and 1 − f_c ≪ 1, D forms small isolated
gangs. As discussed in [12], the D gangs behave as
branching and annihilating random walkers [13]. The D gangs
undergo four basic processes: random walk; an annihilation
reaction (two D gangs can unite); death (a D gang can die
because of irrational choices); and branching (one D gang
can divide into two gangs). Every D gang is surrounded by
cooperators and therefore obtains the highest payoff at
every MC step. However, the density of D is low, and the
random walk breaks the continual accumulation of M_D.
Therefore, M_D is small. When D is dominant, the random
walk of C gangs does not interrupt the accumulation of M_D,
but the average payoff of D at each MC step decreases. Thus,
M_D is maximized at a compromise between the average payoff
at each MC step and the continual accumulation of M_D.
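
The all-cooperator value M_C = 5τ/(1 − τ) quoted above is the fixed point of the memory update of Eq. (2) with a constant payoff U = 5. A short numerical check, with the hypothetical helper name `steady_memory` and arbitrarily chosen values of τ, could look like this:

```python
def steady_memory(u: float, tau: float, steps: int = 2000) -> float:
    """Iterate M <- (M + U) * tau from M = 0 with a constant per-step payoff U = u."""
    m = 0.0
    for _ in range(steps):
        m = tau * (m + u)
    return m


for tau in (0.2, 0.5, 0.9):  # illustrative memory factors (assumption)
    iterated = steady_memory(5.0, tau)
    closed_form = 5.0 * tau / (1.0 - tau)
    print(tau, iterated, closed_form)
```

The iteration is a geometric series, M = U(τ + τ² + ...), so it converges to Uτ/(1 − τ) for any 0 ≤ τ < 1.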

In Ref. [12], the authors discussed the critical exponents
at b_c1 and b_c2. Their MC simulations indicated
power-law behavior, namely f_c ∝ (b_c2 − b)^β and
1 − f_c ∝ (b − b_c1)^β, and the values of β agreed with the
directed percolation (DP) exponent. Grassberger and
Janssen conjectured that all one-component models with
a single absorbing state belong to the DP universality
class. The values of the critical exponents should be
independent of the details of the dynamical rules and
depend only on the spatial dimension. In this paper, we
investigated these exponents for different values of
τ. Fig. 4 shows that β, which ranges from 0.47 (τ = 0)
to 1.10825 (τ = 0.9), increases monotonically with τ.
Therefore, the value of the critical exponent is not
universal but depends on the memory factor τ in this model.
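
A common way to estimate an exponent such as β from the relation 1 − f_c ∝ (b − b_c1)^β is a straight-line fit on a log-log scale. The sketch below assumes that b_c1 is already known and uses synthetic, noise-free data generated with an arbitrary exponent; the helper name `fit_exponent` and all numerical values are illustrative assumptions, not results from the paper, and the original analysis may have used a different fitting procedure.

```python
import numpy as np


def fit_exponent(b, one_minus_fc, b_c1):
    """Least-squares slope of log(1 - f_c) versus log(b - b_c1)."""
    x = np.log(np.asarray(b, dtype=float) - b_c1)
    y = np.log(np.asarray(one_minus_fc, dtype=float))
    slope, _intercept = np.polyfit(x, y, 1)
    return float(slope)


# Synthetic data with a known exponent (NOT simulation output).
b_c1, beta_true = 1.4, 0.6
b = b_c1 + np.logspace(-3, -1, 20)
one_minus_fc = 0.8 * (b - b_c1) ** beta_true

print(fit_exponent(b, one_minus_fc, b_c1))
```

With real simulation data the estimate is sensitive to the assumed value of b_c1 and to finite-size effects, which is consistent with the larger lattices used for Fig. 4.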

FIG. 5: (Color online) (a) Time autocorrelation function of
strategy for several values of τ. (b) The characteristic time
ρ(τ) as a function of τ. The red line is the fit
ρ = 4.29 (1 − τ)^(−2.22).

Considering that a persistent, unchanged strategy at
one site leads to the accumulation of historical payoff, we
investigated how the mobility of spatial patterns depends
on τ. Population mobility is a central feature of real
ecosystems: animals migrate, and bacteria run and tumble.
Similar phenomena can be observed in a rock-paper-scissors
game [20]. Reichenbach, Mobilia, and Frey observed that
mobility has a critical influence on species diversity. In
this model, we find that the behavior of b_c1 is caused by
the decrease in strategy mobility. This means that C resists
temptation b by decreasing mobility. Therefore, we introduce
the time autocorrelation function of strategy:



$$
g(\tau, b, t) = \langle s_i(0)\, s_i(t) \rangle, \qquad (4)
$$

where s_i(t) is the strategy of player i at MC step t.
When player i chooses C, s_i(t) = 1; in contrast, s_i(t) =
−1 for D. ⟨·⟩ denotes an average over all nodes in the
network. Because g(τ, b, t) can be affected by the
density of C, and in order to ensure that g(τ, b, t) ranges
from 0 to 1, we choose b such that f_c = 0.5. This definition
describes whether a node's current strategy correlates
with its strategy t MC steps later.
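
Given a stored sequence of strategy configurations recorded after the transient, the autocorrelation of Eq. (4) is a plain average of products. The following sketch assumes that the configurations are kept as arrays of +1 (C) and −1 (D); the function name `autocorrelation` and the tiny hand-made history are illustrative assumptions.

```python
import numpy as np


def autocorrelation(history):
    """Return g(t) = <s_i(0) s_i(t)> for t = 0, 1, ..., len(history) - 1.

    history[t] is the lattice of strategies (+1 for C, -1 for D) at step t,
    counted from the reference configuration history[0].
    """
    h = np.asarray(history, dtype=float)
    s0 = h[0]
    return np.array([np.mean(s0 * st) for st in h])


# Hand-made two-node example: both nodes have flipped by step 2.
history = [np.array([[1, -1]]), np.array([[1, 1]]), np.array([[-1, 1]])]
print(autocorrelation(history))
```

By construction g(0) = 1; when f_c = 0.5 and the strategies decorrelate completely, g(t) decays toward 0.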

Fig. 5(a) displays the decay of g(τ, t) at f_c = 0.5 with
time. We found that g(τ, t) at f_c = 0.5 fits the form
g(τ, t) ~ exp(−t/ρ(τ)). One can regard ρ as the
characteristic residence time of the unaltered strategy.
We define t_h as the number of MC steps for which one
strategy is maintained and assume that the characteristic
residence time ρ and t_h are linearly related:

$$
t_h = \rho/A + B. \qquad (5)
$$

Fig. 5(b) shows ρ as a function of the parameter τ.
There is a critical behavior ρ ∝ (1 − τ)^(−z), where the
exponent is z = 2.22 with standard deviation 0.043.

Now we focus our attention on the behavior of b_c1.
When b > b_c1, C cannot resist temptation b and D appears.
Therefore, b_c1 can be regarded as the ability of the
model to protect C. As described above, the D gangs
undergo four basic processes. When b = b_c1
and 1 − f_c ≪ 1, the annihilation process is rare, while
the death and branching processes are the major activities.
Therefore, D gangs become stable if the branching rate
is greater than the death rate. We found that a single
D in the branching process produces an offspring and
forms a D-D pair (as shown in Fig. 6). The D-D pair plays
an important role in the branching process of D gangs.
When we discard the effect of noise, the total payoff
of each player in a D-D pair (nodes A and B in Fig. 6)
must be larger than the payoff of its neighbor C (node
C in Fig. 6). Otherwise, the D gangs will eventually die.
For example, in the case of τ = 0, the total payoff of
each player in a D-D pair is 3b, and the total payoff of its
neighbor C is 4. Therefore, the threshold for a stable D
satisfies 3b_c1 = 4, that is, b_c1 = 4/3. We suggest that the
deviation from 4/3 observed in our simulations is caused by
noise.

FIG. 6: Illustration of the D-D pair (nodes A and B) and
neighbor C (node C). The black circle and white circle denote
D and C, respectively. A D-D pair consists of two connected
nodes that both hold strategy D.

The behavior of a stable D is subtle for τ > 0. Based
on the discussion above, t_h increases with τ, and t_h
determines the player's memory and total payoff. Therefore,
we can use t_h to approximate b_c1. When the t_h of
the D-D pair is N, and we neglect the remnants of M_D
that accumulated many MC steps ago and assume that the
neighbor C can remain C indefinitely because of the
dominance of C at b = b_c1, we find that

$$
b_{c1} = \frac{4}{3\,(1 - \tau^{N+1})}. \qquad (6)
$$

In Fig. 2, we plot the results from Eq. (6), which are
similar to the simulation results. We use A = 27 and
B = 0.63 in Eq. (5).



IV. CONCLUSION



In this paper, we studied the ability of memory to protect
C from D in an evolutionary PDG on a square lattice
network. With an increase in the effect of memory,
the density of C increases in most cases.
Using the autocorrelation function, we obtained
the characteristic residence time to measure the mobility
of the spatial pattern. We also found that the mobility
of the spatial pattern decreases as the memory effect
increases. Decreasing mobility induces a maximum value of
the critical coexistence point b_c1 at τ = 0.72. Mobility
therefore plays an important role in this model. The effect
of memory on cooperative behaviors may draw some
attention in evolutionary games.

We have also applied this model to Newman-Watts
small-world (NWSW) networks. The NWSW network is a
two-dimensional small-world network. We found
that a moderate number of long-range links did not have
an obvious qualitative influence on our model.